[codex] Trace cancelled inference streams #19839

Merged

cassirer-openai merged 4 commits into main from codex/rollout-trace-aborted-inference on Apr 27, 2026

Conversation

@cassirer-openai
Contributor

Records cancelled inference streams when Codex stops consuming a provider response before response.completed, preserving complete output items observed before cancellation.

Also closes still-running inference calls when the owning turn ends, so reduced rollout traces do not leave stale Running inference nodes.

Covered by focused reducer tests and a core stream-drop test for partial output preservation.
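
For illustration, a rough sketch of the turn-end repair idea; InferenceState, Turn, and close_open_inferences are assumed names for this sketch, not the actual rollout-trace API:

#[derive(Debug, PartialEq)]
enum InferenceState {
    Running,
    Completed,
    Cancelled,
}

struct InferenceNode {
    state: InferenceState,
}

struct Turn {
    inferences: Vec<InferenceNode>,
}

// Last-resort repair at turn end: anything still Running is force-closed
// so the reduced trace carries no stale Running inference nodes.
fn close_open_inferences(turn: &mut Turn) {
    for node in &mut turn.inferences {
        if node.state == InferenceState::Running {
            node.state = InferenceState::Cancelled;
        }
    }
}

fn main() {
    let mut turn = Turn {
        inferences: vec![
            InferenceNode { state: InferenceState::Completed },
            InferenceNode { state: InferenceState::Running },
        ],
    };
    close_open_inferences(&mut turn);
    assert_eq!(turn.inferences[1].state, InferenceState::Cancelled);
}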

@cassirer-openai force-pushed the codex/rollout-trace-aborted-inference branch 4 times, most recently from 35a8e4f to 31de9a8 on April 27, 2026 18:34
@cassirer-openai force-pushed the codex/rollout-trace-aborted-inference branch from 31de9a8 to c5166ee on April 27, 2026 18:40
@cassirer-openai marked this pull request as ready for review April 27, 2026 19:08
@cassirer-openai requested a review from a team as a code owner April 27, 2026 19:08
@jif-oai
Collaborator

jif-oai commented Apr 27, 2026

@codex review

Contributor

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

if tx_event
    .send(Ok(ResponseEvent::OutputItemDone(item)))
    .await
    .is_err()
{
    return;
}

P1: Emit cancellation when downstream event channel closes

In map_response_stream, a closed tx_event causes an early return (e.g. after OutputItemDone) before any terminal trace call. If the consumer drops between select! and send, the attempt never records InferenceCancelled, so collected items_added are lost and the inference is only force-closed later at turn end without partial output evidence.


Collaborator

@jif-oai left a comment


Can we make the mapper own one “finish once” path for every non-completed stream exit, and keep the reducer fallback as a last-resort repair only?

-    while let Some(event) = api_stream.next().await {
+    loop {
+        let event = tokio::select! {
+            _ = consumer_dropped.cancelled() => {
Collaborator


This only records cancellation while the mapper is waiting on api_stream.next(). If the consumer is dropped while we're blocked in one of the tx_event.send(...).await paths, we lose it. So an interrupted stream can still lose exactly what we're trying to preserve here.

Contributor Author


Yep, good catch. The token only covers the api_stream.next() wait. I added explicit cancellation recording on the tx_event.send(...).await error paths too, so if the receiver is dropped while a send is blocked we still close the inference as cancelled and keep the items observed so far.
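
A minimal sketch of that shape, assuming a tokio mpsc channel; ResponseEvent, TraceHandle, record_cancelled, and items_added here are illustrative stand-ins rather than the real codex-rs types:

use tokio::sync::mpsc;

#[derive(Debug)]
enum ResponseEvent {
    OutputItemDone(String),
}

#[derive(Default)]
struct TraceHandle {
    cancelled_with: Option<Vec<String>>,
}

impl TraceHandle {
    // Record the terminal Cancelled event once, keeping partial output.
    fn record_cancelled(&mut self, items: Vec<String>) {
        if self.cancelled_with.is_none() {
            self.cancelled_with = Some(items);
        }
    }
}

async fn forward_item(
    tx_event: &mpsc::Sender<Result<ResponseEvent, String>>,
    trace: &mut TraceHandle,
    items_added: &mut Vec<String>,
    item: String,
) -> bool {
    if tx_event
        .send(Ok(ResponseEvent::OutputItemDone(item.clone())))
        .await
        .is_err()
    {
        // Receiver dropped while we were blocked in send: close the
        // inference as cancelled, keeping the items observed so far.
        trace.record_cancelled(items_added.clone());
        return false;
    }
    items_added.push(item);
    true
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(1);
    drop(rx); // simulate the consumer going away mid-stream
    let mut trace = TraceHandle::default();
    let mut items = vec!["item observed earlier".to_string()];
    let still_streaming = forward_item(&tx, &mut trace, &mut items, "new item".into()).await;
    assert!(!still_streaming);
    assert_eq!(trace.cancelled_with, Some(vec!["item observed earlier".to_string()]));
}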

@cassirer-openai
Contributor Author

cassirer-openai commented Apr 27, 2026

Can we make the mapper own one “finish once” path for every non-completed stream exit, and keep the reducer fallback as a last-resort repair only?

Yes. I updated the stream mapper so streams should now close themselves explicitly instead of depending on reducer repair.

map_response_stream now records a terminal trace event for each non-completed exit it can observe:

  • API/provider stream error records InferenceFailed, carrying any completed output items observed before the error.
  • Downstream consumer drop / closed response channel records InferenceCancelled, also carrying any completed output items observed before the drop.
  • Provider stream EOF before response.completed records InferenceFailed.

The attempt now has a terminal guard, so whichever path wins records exactly once. The reducer turn-end closure remains only as a last-resort repair for malformed/older traces where an inference is still Running when the owning turn ends.
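
A minimal sketch of such a terminal guard; TerminalKind and record_terminal are assumed names for illustration, not the actual trace API:

#[derive(Debug, PartialEq)]
enum TerminalKind {
    Completed,
    Failed,
    Cancelled,
}

#[derive(Default)]
struct Attempt {
    terminal: Option<TerminalKind>,
    items_added: Vec<String>,
}

impl Attempt {
    // Record the terminal state exactly once; later calls are no-ops.
    fn record_terminal(&mut self, kind: TerminalKind) {
        if self.terminal.is_none() {
            self.terminal = Some(kind);
        }
    }
}

fn main() {
    let mut attempt = Attempt::default();
    attempt.items_added.push("partial item".into());

    // Consumer drop wins the race...
    attempt.record_terminal(TerminalKind::Cancelled);
    // ...so a later exit path or turn-end repair cannot overwrite it.
    attempt.record_terminal(TerminalKind::Failed);

    assert_eq!(attempt.terminal, Some(TerminalKind::Cancelled));
}

Keeping the guard on the attempt itself lets the reducer fallback stay simple: it only has to act when no exit path ever recorded a terminal state.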

Collaborator

@jif-oai left a comment


lgtm after this

Comment thread on codex-rs/rollout-trace/src/reducer/mod.rs
@cassirer-openai enabled auto-merge (squash) April 27, 2026 21:32
@cassirer-openai merged commit 30c5c76 into main Apr 27, 2026
25 checks passed
@cassirer-openai deleted the codex/rollout-trace-aborted-inference branch April 27, 2026 21:58
@github-actions (Bot) locked and limited conversation to collaborators Apr 27, 2026
